Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages
Internal identifier: 000922 (Main/Exploration); previous: 000921; next: 000923
Authors: William D. Lewis [United States]; Fei Xia [United States]
Source:
- Literary and Linguistic Computing [0268-1145]; 2010-09.
Abstract
In this article, we review the process of building ODIN, the Online Database of Interlinear Text (http://odin.linguistlist.org), a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text that has been harvested from scholarly linguistic documents posted on the web. At the time of this writing, ODIN holds nearly 190,000 instances of interlinear text representing annotated language data for more than 1,000 languages (representing data from >10% of the world's languages). ODIN's charter has been to make these data available to linguists and other language researchers via search, providing the facility to find instances of language data and related resources (i.e. the documents from which data were extracted) by language name, language family, and even annotations used to mark up the data (e.g. NOM, ACC, ERG, PST, 3SG). Further, we have sought to enrich the data we have collected and extract knowledge from the enriched content. To enrich the data, we use a variety of statistical tagging and parsing methods applied to the English translations. An enhanced search facility allows users to find data across languages for a variety of syntactic constructions and constituent orders, facilitating unprecedented automated and online discovery of language data.
DOI: 10.1093/llc/fqq006
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000469
- to stream Istex, to step Curation: 000469
- to stream Istex, to step Checkpoint: 000267
- to stream Main, to step Merge: 000925
- to stream Main, to step Curation: 000922
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages</title>
<author wicri:is="90%"><name sortKey="Lewis, William D" sort="Lewis, William D" uniqKey="Lewis W" first="William D." last="Lewis">William D. Lewis</name>
</author>
<author wicri:is="90%"><name sortKey="Xia, Fei" sort="Xia, Fei" uniqKey="Xia F" first="Fei" last="Xia">Fei Xia</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:A068F97DB382C1195C173F6FD8944FBDEC4E9409</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1093/llc/fqq006</idno>
<idno type="url">https://api.istex.fr/document/A068F97DB382C1195C173F6FD8944FBDEC4E9409/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000469</idno>
<idno type="wicri:Area/Istex/Curation">000469</idno>
<idno type="wicri:Area/Istex/Checkpoint">000267</idno>
<idno type="wicri:doubleKey">0268-1145:2010:Lewis W:developing:odin:a</idno>
<idno type="wicri:Area/Main/Merge">000925</idno>
<idno type="wicri:Area/Main/Curation">000922</idno>
<idno type="wicri:Area/Main/Exploration">000922</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages</title>
<author wicri:is="90%"><name sortKey="Lewis, William D" sort="Lewis, William D" uniqKey="Lewis W" first="William D." last="Lewis">William D. Lewis</name>
<affiliation wicri:level="1"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Microsoft Research</wicri:regionArea>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: wilewis@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author wicri:is="90%"><name sortKey="Xia, Fei" sort="Xia, Fei" uniqKey="Xia F" first="Fei" last="Xia">Fei Xia</name>
<affiliation wicri:level="4"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Linguistics, University of Washington</wicri:regionArea>
<placeName><settlement type="city">Seattle</settlement>
<region type="state">Washington (État)</region>
</placeName>
<orgName type="university">Université de Washington</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint><publisher>Oxford University Press</publisher>
<date type="published" when="2010-09">2010-09</date>
<biblScope unit="volume">25</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="303">303</biblScope>
<biblScope unit="page" to="319">319</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">A068F97DB382C1195C173F6FD8944FBDEC4E9409</idno>
<idno type="DOI">10.1093/llc/fqq006</idno>
<idno type="ArticleID">fqq006</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract">In this article, we review the process of building ODIN, the Online Database of Interlinear Text (http://odin.linguistlist.org), a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text that has been harvested from scholarly linguistic documents posted on the web. At the time of this writing, ODIN holds nearly 190,000 instances of interlinear text representing annotated language data for more than 1,000 languages (representing data from >10% of the world's languages). ODIN's charter has been to make these data available to linguists and other language researchers via search, providing the facility to find instances of language data and related resources (i.e. the documents from which data were extracted) by language name, language family, and even annotations used to mark up the data (e.g. NOM, ACC, ERG, PST, 3SG). Further, we have sought to enrich the data we have collected and extract knowledge from the enriched content. To enrich the data, we use a variety of statistical tagging and parsing methods applied to the English translations. An enhanced search facility allows users to find data across languages for a variety of syntactic constructions and constituent orders, facilitating unprecedented automated and online discovery of language data.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Washington (État)</li>
</region>
<settlement><li>Seattle</li>
</settlement>
<orgName><li>Université de Washington</li>
</orgName>
</list>
<tree><country name="États-Unis"><noRegion><name sortKey="Lewis, William D" sort="Lewis, William D" uniqKey="Lewis W" first="William D." last="Lewis">William D. Lewis</name>
</noRegion>
<name sortKey="Xia, Fei" sort="Xia, Fei" uniqKey="Xia F" first="Fei" last="Xia">Fei Xia</name>
</country>
</tree>
</affiliations>
</record>
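The record above can also be read programmatically. The following is a minimal sketch, not part of Dilib, that pulls the title, authors, and DOI out of a trimmed copy of the record using Python's standard library. One detail worth noting: the record uses the `wicri:` prefix without declaring it, so a strict XML parser rejects it as written. The sketch works around this by declaring the prefix on the root element; the namespace URI `urn:example:wicri` and the helper name `parse_record` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above; only a few fields are kept for brevity.
RECORD = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages</title>
<author wicri:is="90%"><name sortKey="Lewis, William D" first="William D." last="Lewis">William D. Lewis</name>
</author>
<author wicri:is="90%"><name sortKey="Xia, Fei" first="Fei" last="Xia">Fei Xia</name>
</author>
</titleStmt>
<publicationStmt><idno type="doi">10.1093/llc/fqq006</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Assumption: a placeholder URI, not an official Wicri namespace.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Return title, authors, and DOI from a Wicri-style record (hypothetical helper)."""
    # Declare the otherwise unbound wicri: prefix on the root element
    # so that the standard parser accepts the document.
    patched = xml_text.replace(
        "<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1
    )
    root = ET.fromstring(patched)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [n.text for n in root.findall(".//titleStmt/author/name")],
        "doi": root.findtext(".//idno[@type='doi']"),
    }


if __name__ == "__main__":
    info = parse_record(RECORD)
    print(info["title"])
    print("; ".join(info["authors"]))
    print(info["doi"])
```

Patching the root tag as a string is a shortcut that suits a one-off script; for bulk processing, a more tolerant parser or a proper namespace declaration at export time would likely be a better choice.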
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000922 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000922 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= CyberinfraV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:A068F97DB382C1195C173F6FD8944FBDEC4E9409 |texte= Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages }}
This area was generated with Dilib version V0.6.25.